Mining Sentiment Classification from Political Web Logs

Authors

  • Kathleen T. Durant
  • Michael D. Smith
Abstract

Over the last few years the number of web logs and the amount of opinionated data on the World Wide Web have grown dramatically. Web logs allow people to share their opinions on a wide range of “hot” topics with virtual communities. As readers start turning to web logs as a source of information, automatic techniques that identify the sentiment of web log posts will help bloggers categorize and filter this exploding information source. However, on the surface, sentiment classification of web log posts, in particular political web log posts, appears to be a more difficult problem than classification of traditional text because of the interplay among the images, hyperlinks, style of writing, and language used within web logs. In this paper we investigate existing technologies and their utility for sentiment classification of web log posts. We show that a Naïve Bayes classifier can on average correctly predict a posting’s political category 78.06% of the time with a standard deviation of 2.39. It significantly outperforms Support Vector Machines at the 99.9% confidence level with a confidence interval of [1.425, 3.488]. On average, SVMs correctly predicted the category of web log posts 75.47% of the time with a standard deviation of 2.64. Previous research achieved accuracies of 81.0% with Naïve Bayes and 82.9% with SVMs using our chosen feature set representation on a nonspecific topic corpus [14]. Using our dataset of political web logs over a two-year period, we also show that it is important to maintain a uniform distribution in such datasets to avoid biases in classification.
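To make the classification task in the abstract concrete, the following is a minimal sketch of a simplified Naïve Bayes classifier over term-presence features, trained on a class-balanced toy corpus. It is not the authors' implementation: the abstract does not specify the feature set, so the whitespace tokenizer, the Laplace smoothing, the `left`/`right` labels, the example posts, and the function names `train_naive_bayes` and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(posts):
    """Count, per class, how many posts contain each term.

    posts: list of (text, label) pairs. Tokenization by whitespace is an
    assumption; the abstract does not describe the feature extraction.
    """
    doc_counts = Counter()              # number of posts per class
    term_counts = defaultdict(Counter)  # per class: posts containing a term
    vocab = set()
    for text, label in posts:
        doc_counts[label] += 1
        terms = set(text.lower().split())  # presence: once per post
        vocab |= terms
        for term in terms:
            term_counts[label][term] += 1
    return {"doc_counts": doc_counts, "term_counts": term_counts, "vocab": vocab}


def classify(model, text):
    """Return the class with the highest log score for the given post.

    Simplification: only terms present in the post (and seen in training)
    contribute to the score; absent terms are ignored.
    """
    total_docs = sum(model["doc_counts"].values())
    terms = set(text.lower().split()) & model["vocab"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        score = math.log(n_docs / total_docs)  # class prior
        for term in terms:
            # Laplace-smoothed estimate of P(term present | class)
            p = (model["term_counts"][label][term] + 1) / (n_docs + 2)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, class-balanced toy corpus (two posts per category).
training_posts = [
    ("tax cuts help small business growth", "right"),
    ("strong border security and lower taxes", "right"),
    ("expand public health care coverage", "left"),
    ("raise the minimum wage for workers", "left"),
]
model = train_naive_bayes(training_posts)
```

With this toy model, `classify(model, "lower taxes for business")` returns `"right"` and `classify(model, "public health care for workers")` returns `"left"`. The training set holds the same number of posts per category so that the class prior does not favor either side, which mirrors the abstract's point about keeping a uniform distribution in the dataset.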

Related articles

The Impact of Time on the Accuracy of Sentiment Classifiers Created from a Web Log Corpus

We investigate the impact of time on the predictability of sentiment classification research for models created from web logs. We show that sentiment classifiers are time dependent and through a series of methodical experiments quantify the size of the dependence. In particular, we measure the accuracies of 25 different time-specific sentiment classifiers on 24 different testing timeframes. We ...

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. It allows users to publish tweets about what they think on various topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

Predicting the Political Sentiment of Web Log Posts Using Supervised Machine Learning Techniques Coupled with Feature Selection

As the number of web logs dramatically grows, readers are turning to them as an important source of information. Automatic techniques that identify the political sentiment of web log posts will help bloggers categorize and filter this exploding information source. In this paper we illustrate the effectiveness of supervised learning for sentiment classification on web log posts. We show that a N...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)

As the internet and its applications grow, e-commerce has become one of its fastest-growing applications. E-commerce customers can express their opinions about a product on the web as text in the form of reviews. In previous studies, merely finding the sentiment of reviews was not enough to capture the exact opinion of the review. In this paper, we have used A...

Publication date: 2006